Distance-based clustering using QUBO formulations

In computer science, clustering is a technique for grouping data. Ising machines can solve distance-based clustering problems described by quadratic unconstrained binary optimization (QUBO) formulations. A typical simple method using an Ising machine makes each cluster size equal and is not suitable for clustering unevenly distributed data. We propose a new clustering method that provides better performance than the simple method, especially for unevenly distributed data. The proposed method is a hybrid algorithm including an iterative process that comprises solving a discrete optimization problem with an Ising machine and calculating parameters with a general-purpose computer. To minimize the communication overhead between the Ising machine and the general-purpose computer, we employed a low-latency Ising machine implementing the simulated bifurcation algorithm with a field-programmable gate array attached to a local server. The proposed method results in clustering 200 unevenly distributed data points with a clustering score 18% higher than that of the simple method. The discrete optimization with 2000 variables is performed 100 times per iteration, and the overhead time is reduced to approximately 20% of the total execution time. These results suggest that hybrid algorithms using Ising machines can efficiently solve practical optimization problems.


Distance-based clustering using QUBO formulations
Nasa Matsumoto 1 , Yohei Hamakawa 2 , Kosuke Tatsumura 2 & Kazue Kudo 1,3* In computer science, clustering is a technique for grouping data. Ising machines can solve distancebased clustering problems described by quadratic unconstrained binary optimization (QUBO) formulations. A typical simple method using an Ising machine makes each cluster size equal and is not suitable for clustering unevenly distributed data. We propose a new clustering method that provides better performance than the simple method, especially for unevenly distributed data. The proposed method is a hybrid algorithm including an iterative process that comprises solving a discrete optimization problem with an Ising machine and calculating parameters with a general-purpose computer. To minimize the communication overhead between the Ising machine and the generalpurpose computer, we employed a low-latency Ising machine implementing the simulated bifurcation algorithm with a field-programmable gate array attached to a local server. The proposed method results in clustering 200 unevenly distributed data points with a clustering score 18% higher than that of the simple method. The discrete optimization with 2000 variables is performed 100 times per iteration, and the overhead time is reduced to approximately 20% of the total execution time. These results suggest that hybrid algorithms using Ising machines can efficiently solve practical optimization problems.
Many combinatorial optimization problems can be described by the Ising model or quadratic unconstrained binary optimization (QUBO) formulations 1 . Ising machines, which are special-purpose computers for solving combinatorial optimization problems, have attracted significant interest in recent years. Inspired by the first quantum annealer 2 , several devices have been developed, such as digital processors based on simulated annealing (SA) [3][4][5][6][7][8][9] , those on simulated bifurcation (SB) [10][11][12] , coherent Ising machines implemented with pulsed lasers [13][14][15][16][17][18] , and other types of optical Ising machines [19][20][21] . Most Ising machines accept an objective function in the form of a Hamiltonian formulated by the Ising model or QUBO formulation. Ising machines return binary solutions that minimize the objective function, although they do not always return optimal solutions because of their heuristic nature. Recent research on the application of Ising machines has shifted from simple combinatorial optimization to hybrid methods using both an Ising machine and a general-purpose computer [22][23][24][25][26][27] . Most hybrid methods are iterative methods that offload the sampling or combinatorial optimization step to an Ising machine.
Clustering is a technique for grouping data such that the members in each cluster have similar characteristics. Although there are various types of clustering, this work focuses on non-hierarchical and distance-based clustering. Because clustering can be formulated as a combinatorial optimization problem, several algorithms using Ising machines have been developed [28][29][30][31][32][33][34][35][36] . Clustering methods using a quantum computer and a quantum annealer have also been proposed and investigated 37,38 . A typical simple method using an Ising machine minimizes the distances between data points in the same cluster, resulting in cluster sizes that are approximately equal 34,35 . Let us suppose that a small group is away from other large groups. Then, using this method, part of a large group is merged into a small group such that the number of members in each group becomes almost equal. This implies that this method is not suitable for clustering unevenly distributed data.
In this work, we propose a clustering method using an Ising machine that applies to unevenly distributed data. We compare the clustering of two data sets, one uniformly distributed and the other unevenly distributed, using the simple method and the proposed method. Employing the average silhouette coefficient, we evaluate clustering performance. The proposed method provides better clustering results than the simple method, especially for unevenly distributed data. The proposed method is a hybrid algorithm that solves discrete optimization problems iteratively. The discrete optimization described in a QUBO formulation is performed by an Ising machine, while a general-purpose (conventional) computer calculates parameters for the QUBO formulation in each iteration.
The proposed method could be implemented in one of three possible ways: (i) discrete optimization executed on a remote Ising machine provided as a cloud service, (ii) discrete optimization undertaken using a local Ising www.nature.com/scientificreports/ machine, and (iii) all computations undertaken on a local general-purpose computer. The first approach incurs a high communication cost with the Ising machine, and so is unsuitable for hybrid algorithms. The second implementation option offloads the computation of discrete optimization steps to the accelerator attached to a local server. This approach has the benefit of a much lower communication cost with the Ising machine than the first approach. In this work, we compare the second and third implementations. The Ising machine used is an SB-based machine implemented with a field-programmable gate array (FPGA) 12 . The low latency of the SB-based machine is advantageous for executing the proposed method. Because the method requires the iterative computation of discrete optimization, a low communication cost between an Ising machine and a general-purpose computer is essential for high performance. This work demonstrates that the proposed hybrid method using an Ising machine provides high-quality results in clustering problems. The method's effectiveness suggests a new usage that takes advantage of a lowlatency Ising machine. Moreover, this work will highly motivate developing iterative hybrid algorithms. Although hybrid algorithms using an Ising machine and a general-purpose computer take extra time for communication, they have many practical applications. The reduction in communication cost enables iterative hybrid algorithms to solve practical optimization problems efficiently. The proposed method will also be applicable as a quantumclassical hybrid algorithm when the communication cost reduces significantly.

Related work
We briefly review some recent works on the clustering method using Ising machines and quantum computers. Simple distance-based clustering is a typical example using an Ising machine or a quantum computer. Simple clustering methods based on the distance between data points 34,35,37 and graph partitioning 33 using a quantum annealer still attract some interest. Several methods inspired by classical clustering have been proposed and developed recently. For example, quantum-assisted clustering based on similarities 28 , K-means 29-31 and K-Medoids 32 clustering on a quantum annealer, as well as K-means clustering on a gate-model quantum computer 38 , have been proposed. K-means-like clustering on a digital Ising machine has also been investigated 36 .

Models and algorithms
We examine two clustering methods. One is the typical simple method in which the Hamiltonian is given as where d ij is the distance between points i and j, N is the number of points, and G is the number of groups. The sum i<j is taken over all combinations that satisfy 1 ≤ i < j ≤ N . Here, we refer to this method as the simplecost method. When point i belongs to group g, x i,g = 1 ; otherwise, x i,g = 0 . The first term of the Hamiltonian sums up all the distances between point pairs in each group. The second term of the Hamiltonian requires each point to belong to exactly one group, and α is a positive constant. This term gives a penalty when a point belongs to more than two groups or does not belong to any group. Because the number of point pairs in each group dominates in the Hamiltonian, each group tends to have almost the same number of points.
To resolve the issue, we propose another Hamiltonian described by x i,g is the number of points in group g. The number of nonzero terms in i<j d ij x i,g x j,g equals the number of point pairs in group g, i.e., N g ( ] is the average distance between the point pairs in group g, where a factor of 2 is omitted for simplicity. In other words, the first term on the right-hand side of Eq. (2) expresses the sum of the average distances between the point pairs in each group. The dominance of the number of point pairs in each group in Eq. (1) is dissolved in Eq. (2). It will be experimentally shown later (in Fig. 4) that this formulation works well, especially for unevenly distributed data points.
Ising machines cannot directly minimize non-QUBO formulations such as Eq. (2). Instead, we employ a hybrid algorithm in which another Hamiltonian described by a QUBO formulation is iteratively minimized. We refer to this method as the iterative fractional-cost method.
The iterative fractional-cost method originates from the hybrid parametric method proposed to solve the vehicle routing problem 24 . The hybrid parametric method is an extension of an inexact parametric algorithm to solve fractional programming problems 39 . Instead of minimizing the original fractional objective function, the hybrid method iteratively solves the corresponding parametric problem in the discrete-optimization step, using a quantum annealer or an Ising machine.
The algorithm of the iterative fractional-cost method is as follows.
1. Set the error parameter δ , the iteration counter n as n = 0 , and parameter as an initial value 0 = 0. 2. Minimize the following Hamiltonian with an Ising machine.
Otherwise, return to step 2 with n → n + 1 while n + 1 < n max . Alternatively, terminate with no solution when n + 1 = n max .
If the algorithm successfully terminates, n+1 is expected to give the minimum cost. Because of the heuristic nature of the Ising machine, n+1 does not always coincide with the global optimal solution. However, it is proved that the inexact parametric method converges to a global optimum if the obtained solution at the discreteoptimization step (step 2) is a good approximation 39 .
The key point of the iterative fractional-cost method is the use of an Ising machine to minimize Eq. (3), which is a QUBO formulation, instead of the fractional-cost Hamiltonian, Eq. (2). In the simulation below, we take the error parameter as δ = 10 −6 and the maximum iteration number as n max = 10.

Results
We apply the two methods to two kinds of data sets. One is the set of points distributed uniformly in a square region. The other is the set of unevenly distributed points in the same size region. Each set comprises 200 points ( N = 200 ), and we take the number of groups as G = 10 here, so that there are 2,000 binary variables in total. The data sets are provided as Supplementary Information.
Simple-cost method. In the simple-cost method, we minimize Eq. (1) using the SA or SB. Figure 1 exhibits clustering results, where points with the same color belong to the same group. Each panel shows a result selected to demonstrate the best performance of the method. We obtained Fig. 1a,c using SA, and Fig. 1b,d using SB. We find almost no difference between Fig. 1a and Fig. 1b, with little difference between Fig. 1c and Fig. 1d. For the unevenly distributed data, an apparent cluster is divided into two groups, and some points in apparent clusters are classified as other group members.
Solutions provided by SA or SB are not always valid because they sometimes violate the constraint that each point should belong to exactly one group. We call the solution satisfying this constraint a feasible solution. Here, we perform the simulation 100 times for each data set, using either SA or SB for the given time steps. The ratio of feasible solutions among the solutions obtained in the 100 trials is the feasible solution rate shown in Fig. 2. The horizontal axis in Fig. 2 represents the number of time steps. The larger the number of steps, the slower the SA or SB process. Here, the hyperparameter values are α = 5.5 and α = 6.0 for SA and SB, respectively, for the data set with a uniform distribution; α = 5 for SA and SB for that with an uneven distribution. In Fig. 2, the feasible solution rate is almost 100%, except for the region where the number of steps is relatively small. However, a low feasible solution rate does not always result in an inferior result. By changing the value of α , we obtained similar clustering results even though feasible solution rates were 40-80%.
For evaluation of the clustering result, we introduce the silhouette coefficient 40 . Because the silhouette coefficient is defined for each data point, we take the average of every point's silhouette coefficient in a feasible solution. The better the clustering result, the higher the average silhouette coefficient.  Fig. 3a,b], while it is approximately 0.45-0.6 or higher for the unevenly distributed one [Fig. 3c,d]. The difference reflects the difference in data distribution. On the other hand, the difference between SA and SB is significant. Although the execution time is much shorter for SB than for SA, the average silhouette coefficient for SB is almost equal or even higher. Moreover, the variance of the average silhouette coefficient is small for SB in the long-time region.
Iterative fractional-cost method. We use only SB to perform discrete optimization in the iterative fractional-cost method. Because the simulation using SA had a longer run time and provided similar or worse results than SB in the simple-cost method, we cannot expect SA to provide better results than SB in the iterative fractional-cost method. Figure 4 exhibits examples of clustering results in the iterative fractional-cost method. Compared with Fig. 1, unevenly distributed data points are well classified. Figure 5 illustrates the final feasible solution rates. Here, the simulation was run 50 times for each data set. In other words, the final feasible solution rate is the ratio of feasible solutions obtained at the end of the algorithm among the 50 trials. The number in the legend represents the number of iterations of steps 2-4 in the iterative fractional-cost method before the final solution is obtained. In this work, we perform optimization sampling 100 times at the discrete optimization step (step 2) to obtain several feasible solutions for Eq. (3). The optimal solution among them is x , which evaluates n+1 in Eq. (4). Even if x is feasible at each iteration, this method sometimes fails to obtain a final solution within n max = 10.  Fig. 5, the final feasible solutions are obtained in three iterations. More iterations are required to reach a final feasible solution in the small-number-of-step region for the uniformly distributed data, as shown in Fig. 5a,c. The simulation for the unevenly distributed data requires more iterations for a large number of steps, as shown in Fig. 5b,d. The discrete optimization step works better when the number of steps is large compared to when it is small, which can cause overfitting. If the solution is overfitted to a temporal parameter n in each iteration, it may become difficult for the algorithm to converge. However, controlling the hyperparameter α can result in the simulation requiring fewer iterations.  The average silhouette coefficient is almost independent of α and the number of SB time steps in the region corresponding to Figs. 5 and 6. For the uniformly distributed data, the average silhouette coefficient is 0.392-0.394, which is comparable to that of the simple-cost method. However, for the unevenly distributed data, the average silhouette coefficient is 0.709-0.716, approximately 18% higher than that of the simple-cost method.

Discussion
In this work, we have compared two clustering methods based on QUBO formulations. One is the simple-cost method, and the other is the iterative fractional-cost method. We have applied each method to data sets with uniform and uneven distributions. The simulation of the simple-cost method is performed using SA and SB, whereas that of the iterative fractional-cost method is performed using only SB.
Clustering results highly depend on data distribution. For an uneven distribution, the iterative fractionalcost method works better than the simple-cost method. However, the iterative fractional-cost method requires   www.nature.com/scientificreports/ more time to obtain a result because it requires several iterations. If the variation in cluster size is significant, the iterative fractional-cost method is a better choice. Otherwise, the simple-cost method is acceptable. The difference between SA and SB is significant, especially in execution time. The difference in execution time arises mainly from the parallelization of an algorithm. In this work, SA is based on the single spin-flip Monte Carlo method and is out of parallelization. However, SB is executed through massively parallel processing. The parallelization is the key to the acceleration of finding reasonable solutions.
Another reason why SB outperformed SA in execution time is that the accelerator made of an FPGA is attached to the local server. Even if a remote Ising machine or a quantum annealer solves QUBO problems faster than SA, the communication time between the machine and the local server may cancel the acceleration. The results shown above imply that the acceleration by the SB-based machine overcame the communication overhead between the accelerator and the general-purpose processor, i.e., CPU, in the local server.
This work paves the way for solving practical optimization problems requiring iterative hybrid algorithms in a reasonable time by using an Ising machine. For most present Ising machines, including quantum annealers, iterative hybrid algorithms are relatively inefficient, mainly because of communication overheads. However, iterative hybrid methods using a high-speed Ising machine with low latency can provide high performance in solving practical optimization problems. The present work demonstrates such an example.

Methods
Execution time. In this section, we define one-shot execution time as the time required to solve a discrete optimization problem. The detailed definition differs between SA and SB because different architectures were used for each method. For SA, we used the dwave-neal package 41 , which is an SA software, and measured the time from calling the SA sampler to obtaining solutions. When the SA sampler executes SA n reads times per call, the one-shot execution time is the measured time divided by n reads . In SB, the one-shot execution time, which is measured for each call, includes the FPGA calculation time and the communication time between the CPU and FPGA.
The execution time for the simple-cost method (shown in Fig. 3) is the same as the one-shot execution time. On the other hand, in the iterative fractional-cost method, the execution time shown in Fig. 6 is defined as the sum of the one-shot execution times. As shown in Fig. 7, only the discrete optimization step (step 2) is executed in the FPGA. The step is iterated several times, and 100 shots of SB are executed at each iteration. Thus, the execution time for the iterative fractional-cost method is several hundred times longer than that for the simplecost method.
The overhead time in SB is a minor component of execution time. For example, as shown in Fig. 3b, the execution time for 2000 SB time steps is approximately 14 ms. The communication time between the CPU and FPGA, which is independent of the number of SB time steps, is approximately 3 ms. Thus, on average, the overhead time is approximately 20% of the execution time for the problem considered in this work.
Ballistic simulated bifurcation. The SB algorithm, which is a quantum-inspired algorithm, was proposed to accelerate combinatorial optimization [10][11][12] . The SB is based on adiabatic evolution in classical nonlinear systems showing bifurcations. The SB algorithm finds a spin configuration minimizing the Ising model energy defined by where s i = ±1 is the ith spin, N is the number of spins, J i,j is the coupling coefficient between the ith and jth spins, and h i is the local field on the ith spin.
SB has several variants, the first of which was adiabatic SB (aSB). The set of equations for aSB is given by www.nature.com/scientificreports/ where x i and y i are real numbers corresponding to the ith spin, a 0 , c 0 and η are positive constants, and a(t) is a control parameter that increases from zero to a 0 . Equations (6) and (7) express the equations of motion of the classical particle corresponding to the ith spin: x i and y i represent the position and momentum of the ith particle, respectively. Classical particles interact in a potential whose shape gradually changes. The sign of x i at the end of the time evolution gives s i as the solution to the problem described by Eq. (5). Because x i is a continuous variable, even though spin s i = ±1 is discrete, analog errors arise in the aSB. The ballistic SB used in this work was developed to suppress analog errors 12 . Instead of the nonlinear term x 3 i in Eq. (7), perfectly inelastic walls are introduced at x i = ±1 . Using the symplectic Euler method, we numerically solve the set of equations of motion for bSB, whose updating rule is as follows 12 .
where t k is the kth time step, and t is the time-step width. Namely, t k+1 = t k + t . Here, we take t 0 = 0 . At each time after the update of x i , if |x i | > 1 , we replace x i with sgn(x i ) = ±1 and set y i = 0.
Parameters. The hyperparameter α in Eqs. (1) and (3)  In SA, the parameter controlling the annealing time is the number of sweeps or steps, called num_sweeps in the dwave-neal package. In SB, the number of steps n steps controls the increasing rate of a(t), that is, a(t k ) = (1 − k/n steps )a 0 . In this work, we set c 0 = a 0 / max , where max is the maximum eigenvalue of matrix J ij , and a 0 = η = 1 and t = 0.5 in Eqs. (8) and (9).
Average silhouette coefficient. The silhouette coefficient is defined for each data point, and it is higher when the point belongs to a well-defined cluster. The silhouette coefficient for point i is defined by where a(i) is the mean distance between point i and the other points in the same group, and b(i) is the mean distance between point i and points in the neighboring group. However, if a point belongs to several groups or does not belong to any group, the silhouette coefficient cannot be properly calculated. Therefore, we calculate the silhouette coefficient for a feasible solution in which each point belongs to exactly one group and take the average over all the points in the feasible solution.
Received: 12 October 2021; Accepted: 2 February 2022 x i (t k+1 ) = x i (t k ) + a 0 y i (t k+1 )� t , License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.